
CLAIMS 

What is claimed is: 

1. A method of reducing warranty costs, comprising:

discriminating between a hardware-induced problem or outage and a
software-induced problem or outage in a computer system.

2. The method of claim 1, further comprising: 

periodically storing indicators of system software and hardware health 
prior to the problem or outage. 

3. The method of claim 2, further comprising: 

after the problem or outage, analyzing those indicators to determine 
whether the problem or outage was due to hardware or software. 

4. The method of claim 3, further comprising: 

presenting information regarding a cause of the problem or outage to a 
user of the computer system to prevent an unnecessary service call and 
hardware replacement. 

5. The method of claim 1, further comprising: 

depending upon said determining of said hardware-induced problem or 
outage or said software-induced problem or outage, determining a 
manufacturer of said hardware or said software having undergone said
problem or said outage. 

6. The method of claim 1, wherein, in event of an outage of one of said 
hardware and software, pre-outage data is stored in a log file across the 
outage. 
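
As an illustration of claim 6 only, the following minimal sketch shows one way pre-outage data might be kept in a log file so that it survives an outage. The function names `append_health_record` and `read_health_log`, the JSON-lines layout, and the `ts` field are assumptions made for this example and do not appear in the claims.

```python
import json
import os
import time


def append_health_record(path, record):
    """Append one health sample and force it to stable storage.

    Hypothetical helper: flushing and fsync-ing each record is one way to
    make the sample likely to survive a sudden outage.
    """
    line = json.dumps({"ts": time.time(), **record})
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to the device


def read_health_log(path):
    """Read samples back after a restart, skipping a torn final line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # tolerate a record cut short by the outage
    return records
```

Writing one record per line means an outage in the middle of a write damages at most the last record, which the reader then skips.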

7. A method of reducing warranty costs associated with a computer system, 
comprising: 

detecting a lack of performance of said computer system; and 
discriminating whether said lack of performance was caused by a 
hardware-induced problem or a software-induced problem. 

8. The method of claim 7, further comprising: 

gathering pre-lack of performance data, said discriminating being 
performed based on said pre-lack of performance data. 

9. The method of claim 7, further comprising: 
recovering from said lack of performance. 

10. The method of claim 8, wherein said lack of performance comprises an 
outage, and in event of an outage of one of said hardware and software, said 
pre-outage data is stored across the outage. 

11. A method of reducing warranty costs, comprising:

discriminating between a hardware-induced problem or outage and a
software-induced problem or outage in a computer system; and

based on said discriminating, reducing a duration of a service call and 
ensuring that a service technician has a correct part on hand at a time of repair. 

12. The method of claim 11, further comprising:

periodically storing indicators of system software and hardware health 
prior to the problem or outage. 

13. The method of claim 12, further comprising: 

after the problem or outage, analyzing the indicators to determine whether 
the problem or outage was due to hardware or software and which hardware or 
software subsystem was most likely a cause of the outage, and to produce 
information. 

14. The method of claim 13, further comprising: 

presenting the information to a service technician of a computer system to 
replace or repair a faulty subsystem. 

15. A method of reducing a trouble-shooting cost in a computer system, 
comprising: 

sampling system health data from a plurality of sources, and storing said 
data in a log; 
determining whether an outage event has occurred; and
based on whether an outage event occurs, analyzing logged and other data 
to judge a likely cause of the event. 

16. The method of claim 15, wherein if the event is judged to be due to 
software, determining whether automatic recovery is possible, and if so, 
invoking an automatic recovery mechanism and notifying a customer or field 
support personnel that a software problem is the cause of the event, and 
identifying a faulty subsystem for subsequent troubleshooting. 

17. The method of claim 15, wherein if the event is judged to be due to
software, determining whether automatic recovery is possible, and if not, then 
indicating that the event is due to software, and is not automatically 
recoverable, and notifying a customer or service technician to manually 
recover the fault. 

18. The method of claim 15, further comprising: 

determining whether the event is a software fault and if not, then 
determining whether a diagnosable hardware fault exists. 

19. The method of claim 18, further comprising: 

if the event is judged to be caused by hardware, then examining at least 
one of a hardware error log, an error register, and a hardware diagnostic, and 
attempting to localize a replaceable component that caused the event; 
informing a customer or a service technician that the outage was due to
hardware; and 

manually recovering the hardware by replacing only defective hardware. 
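
The decision flow recited in claims 15 through 19 can be sketched as follows. This is an illustrative outline under stated assumptions, not the claimed implementation: the `Evidence` fields, the `Cause` enumeration, and the `triage` function are hypothetical names, and the analysis that would populate `Evidence` from logged data is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Cause(Enum):
    SOFTWARE = "software"
    HARDWARE = "hardware"
    UNKNOWN = "unknown"


@dataclass
class Evidence:
    """Hypothetical summary of the analysis of logged and other data."""
    software_fault: bool
    auto_recoverable: bool
    hardware_fault_diagnosable: bool
    suspect_component: Optional[str] = None


def triage(e: Evidence) -> Tuple[Cause, str]:
    """Return the judged cause and the follow-up action."""
    if e.software_fault:
        if e.auto_recoverable:
            # Claim 16: recover automatically and report a software cause.
            return Cause.SOFTWARE, "invoke automatic recovery and notify"
        # Claim 17: not automatically recoverable, so ask for manual recovery.
        return Cause.SOFTWARE, "notify for manual recovery"
    if e.hardware_fault_diagnosable:
        # Claims 18 and 19: localize the replaceable component if possible.
        part = e.suspect_component or "unlocalized component"
        return Cause.HARDWARE, "notify and replace " + part
    # Neither branch applies; this fallback is an assumption of the sketch.
    return Cause.UNKNOWN, "escalate for further diagnosis"
```

Software is checked first because claim 18 tests for a diagnosable hardware fault only when the event is not a software fault.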

20. A computer node associated with a computer system, comprising:

hardware for executing an operating system, at least one application
program, and a system health monitoring program,

wherein said system health monitoring program gathers system software 
and hardware health data from an application program, an operating system, 
and the hardware, and discriminates a cause of an event comprising at least 
one of a problem or outage of said computer node. 

21. The computer node of claim 20, wherein said computer node includes
sources of information for assessing software and hardware health. 

22. The computer node of claim 21, wherein said information is measured 
and logged prior to a failure event, 

said system health monitoring program monitors at least one of resource 
consumption data, system and application software error logs, system 
utilization and performance data, and software error counts. 

23. The computer node of claim 20, wherein said system health monitoring 
program monitors at least one of concurrent diagnostics, hardware error logs, 
and hardware error counts, and 
wherein said system health monitoring program gathers information after
the event, including at least one of error logs, crash dumps of memory, error 
codes, offline or power-on hardware diagnostics, and hardware error registers. 

24. The computer node of claim 20, wherein said system health monitoring 
program includes a log device for permanently storing a time history of system 
software and hardware health data, said log device being readable after an 
event to determine a likely cause of the event. 

25. The computer node of claim 20, wherein said system health monitoring 
program includes an analyzer for analyzing the software and hardware health 
data. 

26. The computer node of claim 25, wherein said analyzer is run on the 
computer system that has experienced a problem, or on another execution 
environment. 

27. The computer node of claim 20, wherein said system health monitoring 
program includes a notifier for notifying a customer or field service support 
personnel regarding a cause of the outage or problem, whether a service call is 
necessary, and where the likely cause of the outage or problem resides. 

28. The computer node of claim 20, wherein said system health monitoring
program samples a plurality of parameters, said plurality of parameters 
including at least one of: 

a parameter indicating a number of bytes that must be kept in physical 
memory and cannot be paged out to disk; 

a parameter indicating a number of bytes that reside in said physical 
memory plus the paging files; 

a parameter indicating a number of errors that have been reported by 
transmission control protocol (TCP)/Internet Protocol (IP) software; and 

a parameter indicating whether said TCP errors are accompanied by 
Network Adapter Errors. 
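
For illustration, the parameters listed in claim 28 might be represented and compared between two sampling intervals as in the sketch below. The `Sample` field names and the `tcp_errors_with_adapter_errors` function are hypothetical, and treating both values as cumulative counters is an assumption of this example rather than something the claim states.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of the claim 28 parameters (field names are hypothetical)."""
    nonpaged_bytes: int    # bytes that must stay in physical memory
    committed_bytes: int   # bytes in physical memory plus the paging files
    tcp_errors: int        # cumulative errors reported by TCP/IP software
    adapter_errors: int    # cumulative network adapter errors


def tcp_errors_with_adapter_errors(prev: Sample, curr: Sample) -> bool:
    """True if new TCP errors in this interval came with new adapter errors."""
    new_tcp = curr.tcp_errors - prev.tcp_errors
    new_adapter = curr.adapter_errors - prev.adapter_errors
    return new_tcp > 0 and new_adapter > 0
```

Comparing consecutive samples rather than raw totals keeps errors from before the interval of interest from skewing the result.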



29. A system for use with a computer system, comprising:

an outage detector for detecting an outage;

a memory for storing pre-outage data of the system; and

a discriminator for discriminating whether said outage was caused by a
hardware component or a software component of said system.



30. The system according to claim 29, wherein, in event of an outage of one 
of said hardware and software, said pre-outage data is stored across the outage. 



31. A signal-bearing medium tangibly embodying a program of
machine-readable instructions executable by a digital processing apparatus to
perform a method for reducing warranty costs, said method comprising:

discriminating between a hardware-induced problem or outage and a
software-induced problem or outage in a computer system.




YOR920010068US1 



